Clustering Co-occurrence of Maximal Frequent Patterns in Streams

Authors

  • Edgar H. de Graaf
  • Joost N. Kok
  • Walter A. Kosters
Abstract

One way of getting a better view of data is by using frequent patterns. In this paper, frequent patterns are (sub)sets that occur a minimal number of times in a stream of itemsets. However, discovering frequent patterns in streams has always been problematic. Because streams are potentially endless, it is in principle impossible to say whether a pattern occurs often or not. Furthermore, the number of patterns can be huge, and a good overview of the structure of the stream is quickly lost. The proposed approach uses clustering to facilitate the “online” analysis of the structure of the stream. A clustering on the co-occurrence of patterns gives the user an improved view of the structure of the stream. Some patterns might occur together so often that they should form a combined pattern. In this way the patterns in the clustering approximate the largest frequent patterns: the maximal frequent patterns. The number of (approximated) maximal frequent patterns is much smaller, and combined with clustering methods these patterns provide a good view of the structure of the stream. Our approach to deciding whether patterns often occur together is based on a clustering method in which only the distance between pairs of patterns is known. This distance is the Euclidean distance between points in a 2-dimensional space, where the points represent the frequent patterns, or rather the most important ones. The coordinates are adapted as the records from the stream pass by, and reflect the real support of the corresponding pattern. In this setup the support is viewed as the number of occurrences in a time window. The main algorithm tries to maintain a dynamic model of the data stream by merging and splitting these patterns. Experiments show the versatility of the method.
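The abstract gives no update rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It keeps a sliding time window of records, counts support inside that window, and nudges 2-dimensional points so that the Euclidean distance between two patterns drifts toward a target derived from how often they co-occur. The class name `CooccurrenceLayout`, the Jaccard-style target distance, the circular starting layout, the `rate` and `window` parameters, and the `merge_candidates` helper are all assumptions introduced for illustration.

```python
import math
from collections import deque
from itertools import combinations


class CooccurrenceLayout:
    """Toy 2-D layout of patterns driven by co-occurrence in a time window."""

    def __init__(self, patterns, window=50, rate=0.2):
        self.patterns = [frozenset(p) for p in patterns]
        self.window = deque(maxlen=window)  # only the most recent records
        self.rate = rate                    # hypothetical step size in (0, 1]
        n = len(self.patterns)
        # Assumption: start the points evenly spaced on a circle of radius 0.5.
        self.pos = {
            p: [0.5 * math.cos(2 * math.pi * i / n),
                0.5 * math.sin(2 * math.pi * i / n)]
            for i, p in enumerate(self.patterns)
        }

    def support(self, *patterns):
        """Number of records in the window that contain all given patterns."""
        return sum(1 for rec in self.window if all(p <= rec for p in patterns))

    def distance(self, p, q):
        return math.dist(self.pos[frozenset(p)], self.pos[frozenset(q)])

    def update(self, record):
        """Consume one record and move each pair toward its target distance."""
        self.window.append(frozenset(record))
        for p, q in combinations(self.patterns, 2):
            both = self.support(p, q)
            either = self.support(p) + self.support(q) - both
            if either == 0:
                continue  # neither pattern occurs in the window
            # Assumed target: 0 when always together, 1 when never together.
            target = 1.0 - both / either
            d = math.dist(self.pos[p], self.pos[q])
            if d == 0.0:
                continue  # coincident points have no direction; left as-is
            step = self.rate * (d - target) / (2 * d)
            for axis in range(2):
                shift = step * (self.pos[q][axis] - self.pos[p][axis])
                self.pos[p][axis] += shift
                self.pos[q][axis] -= shift

    def merge_candidates(self, threshold=0.05):
        """Pairs whose points are close enough to suggest a combined pattern."""
        return [(p, q) for p, q in combinations(self.patterns, 2)
                if math.dist(self.pos[p], self.pos[q]) < threshold]


layout = CooccurrenceLayout([{"a"}, {"b"}, {"c"}])
for i in range(100):
    layout.update({"a", "b"} if i % 2 == 0 else {"c"})
print(layout.merge_candidates())
```

In this toy stream, `a` and `b` always appear together and never with `c`, so their points drift together while `c` settles about one unit away. The sketch recounts support by scanning the window for every pair, which is fine for illustration but would need incremental counters on a real stream.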

Similar resources

Displaying Co-occurrences of Patterns in Streams for Website Usage Analysis

One way of getting a better view of data is by using frequent patterns. In this paper frequent patterns are (sub)sets that occur a minimal number of times in a stream of itemsets. However, the discovery of frequent patterns in streams has always been problematic. Because streams are potentially endless it is harder to say if a pattern is frequent or not. Furthermore, the number of patterns can ...


Clustering Co-occurrences of Maximal Frequent Patterns in Streams

One way of getting a better view of data is by using frequent patterns. In this paper frequent patterns are (sub)sets that occur a minimal number of times in a stream of itemsets. However, the discovery of frequent patterns in streams has always been problematic. Because streams are potentially endless it is in principle impossible to say if a pattern is often occurring or not. Furthermore, the...


A Study on Distributed Frequent Co-occurrence Patterns Algorithms across Multiple Data Streams

With the coming of the big data era, data streams are fast, continuous, and unbounded, and the real-time requirements on stream processing results are very high. Much research has addressed Frequent Co-occurrence Patterns across multiple data streams. But those algorithms are centralized, running on a single compute node. The memory of a single compute node and CPU c...


Mining Frequent Co-occurrence Patterns across Multiple Data Streams

This paper studies the problem of mining frequent co-occurrence patterns across multiple data streams, which has not been addressed by existing works. A co-occurrence pattern in this context refers to the case where the same group of objects appears consecutively in multiple streams over a short time span, signaling tight correlations between these objects. The need for mining such patterns in real...


Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...


Journal:
  • CoRR

Volume abs/0705.0588

Publication date 2007